Results 1 - 20 of 11,431
1.
BMC Palliat Care ; 23(1): 83, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38556869

ABSTRACT

BACKGROUND: Owing to limited numbers of palliative care specialists and/or resources, access to palliative care remains limited in many low- and middle-income countries. Data science methods, such as rule-based algorithms and text mining, have the potential to improve palliative care by facilitating analysis of electronic healthcare records. This study aimed to develop and evaluate a rule-based algorithm for identifying cancer patients who may benefit from palliative care based on the Thai version of the Supportive and Palliative Care Indicators for a Low-Income Setting (SPICT-LIS) criteria. METHODS: The medical records of 14,363 cancer patients aged 18 years and older, diagnosed between 2016 and 2020 at Songklanagarind Hospital, were analyzed. Two rule-based algorithms, strict and relaxed, were designed to identify key SPICT-LIS indicators in the electronic medical records using tokenization and sentiment analysis. The inter-rater reliability between each of these two algorithms and palliative care physicians was assessed using percentage agreement and Cohen's kappa coefficient. Additionally, factors associated with patients who might benefit from palliative care were examined. RESULTS: The strict rule-based algorithm demonstrated a high degree of accuracy, with 95% agreement and a Cohen's kappa coefficient of 0.83. In contrast, the relaxed rule-based algorithm demonstrated lower agreement (71% agreement and a Cohen's kappa of 0.16). Advanced-stage cancer with symptoms such as pain, dyspnea, edema, delirium, xerostomia, and anorexia was identified as a significant predictor of potentially benefiting from palliative care. CONCLUSION: The integration of rule-based algorithms with electronic medical records offers a promising method for enhancing the timely and accurate identification of patients with cancer who might benefit from palliative care.
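The two agreement statistics named in this abstract can be illustrated with a minimal sketch. The label lists below are hypothetical and are not the study's data; the functions only show how percentage agreement and Cohen's kappa are computed for two raters (here, an algorithm and a physician).

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of cases on which the two raters give the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined here; 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: 1 = flagged as likely to benefit from palliative care.
algorithm = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
physician = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
agreement = percent_agreement(algorithm, physician)
kappa = cohens_kappa(algorithm, physician)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

Because kappa subtracts chance agreement, it can be much lower than raw agreement when one label dominates, which is one general reason the two statistics are usually reported together.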


Subject(s)
Neoplasms , Palliative Care , Humans , Reproducibility of Results , Electronic Health Records , Neoplasms/therapy , Data Mining , Algorithms
2.
Zhongguo Zhong Yao Za Zhi ; 49(3): 836-841, 2024 Feb.
Article in Chinese | MEDLINE | ID: mdl-38621887

ABSTRACT

This study aims to construct the element relationships and extension path of a clinical evidence knowledge map for Chinese patent medicine, providing basic technical support for the formation and transformation of the evidence chain of Chinese patent medicine and providing schemes for collecting, organizing, and summarizing massive and disorganized clinical data. Based on the elements of evidence-based PICOS, the conventional construction methods of knowledge graphs were collected and summarized. First, the data entities related to Chinese patent medicine were classified, and entity linking (disambiguation) was performed. Second, the attribute information of the data entities was associated and classified. Finally, the logical relationships between entities were constructed, and the element relationships and extension path of a knowledge map conforming to the characteristics of clinical evidence of Chinese patent medicine were summarized. The construction of the clinical evidence knowledge map of Chinese patent medicine was mainly based on process design and logical structure, and the element relationships of the knowledge map were expressed according to the PICOS principle and evidence level. The extension path crossed three levels (model layer, data layer application, and new evidence application), and the study gradually explored the path from disease, core evaluation indicators, Chinese patent medicine, core prescriptions, syndrome and treatment rules, and medical case comparison (evolution law) to new drug research and development. This study has clarified the top-level design for constructing the clinical evidence knowledge map of Chinese patent medicine, but the work still requires joint efforts across disciplines. With the continuous improvement of map construction technology in line with the characteristics of traditional Chinese medicine (TCM), the study can provide necessary basic technical support and reference for the development of the TCM discipline.


Subject(s)
Drugs, Chinese Herbal , Drugs, Chinese Herbal/therapeutic use , Medicine, Chinese Traditional , Nonprescription Drugs/therapeutic use , Technology , Data Mining/methods
3.
Food Chem Toxicol ; 187: 114638, 2024 May.
Article in English | MEDLINE | ID: mdl-38582341

ABSTRACT

With society increasingly demanding alternative protein food sources, new strategies for evaluating protein safety issues, such as allergenic potential, are needed. Large-scale and systemic studies on allergenic proteins are hindered by the limited and non-harmonized clinical information available for these substances in dedicated databases. A key piece of missing information is the symptomatology of the allergens, especially expressed in terms of standard vocabularies, which would allow connections with other biomedical resources for different studies related to human health. In this work, we have generated the first resource with a comprehensive annotation of allergens' symptomatology, using a text-mining approach that extracts significant co-mentions between these entities from the scientific literature (PubMed, ∼36 million abstracts). The method identifies statistically significant co-mentions between the textual descriptions of the two types of entities in the literature as an indication of a relationship. A total of 1,180 clinical signs extracted from the Human Phenotype Ontology and the Medical Subject Headings terms of PubMed, together with other allergen-specific symptoms, were linked to 1,036 unique allergens annotated in two main allergen-related public databases via 14,009 relationships. This novel resource, publicly available through an interactive web interface, could serve as a starting point for future manually curated compilation of allergen symptomatology.
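One common way to decide whether two terms are co-mentioned more often than chance would predict is a one-sided hypergeometric test. The sketch below assumes that test and uses hypothetical abstract counts; the abstract does not state which statistic the authors used.

```python
from math import comb


def comention_p_value(n_total, n_allergen, n_symptom, n_both):
    """P(overlap >= n_both) if the two terms were mentioned independently.

    n_total    -- abstracts in the corpus
    n_allergen -- abstracts mentioning the allergen
    n_symptom  -- abstracts mentioning the symptom
    n_both     -- abstracts mentioning both
    """
    upper = min(n_allergen, n_symptom)
    denominator = comb(n_total, n_symptom)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_allergen, k) * comb(n_total - n_allergen, n_symptom - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denominator


# Hypothetical counts: 1,000 abstracts, 50 mention the allergen,
# 40 mention the symptom, 10 mention both (about 2 expected by chance).
p_value = comention_p_value(1000, 50, 40, 10)
print(f"p = {p_value:.2e}")
```

In a corpus-wide screen many allergen-symptom pairs are tested at once, so a multiple-testing correction would normally be applied before calling a pair significant.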


Subject(s)
Allergens , Data Mining , Humans , Data Mining/methods , Databases, Factual , Proteins/metabolism
4.
Sci Rep ; 14(1): 7635, 2024 04 01.
Article in English | MEDLINE | ID: mdl-38561391

ABSTRACT

Extracting knowledge from hybrid data, comprising both categorical and numerical data, poses significant challenges due to the inherent difficulty in preserving information and practical meanings during the conversion process. To address this challenge, hybrid data processing methods, combining complementary rough sets, have emerged as a promising approach for handling uncertainty. However, selecting an appropriate model and effectively utilizing it in data mining requires a thorough qualitative and quantitative comparison of existing hybrid data processing models. This research aims to contribute to the analysis of hybrid data processing models based on neighborhood rough sets by investigating the inherent relationships among these models. We propose a generic neighborhood rough set-based hybrid model specifically designed for processing hybrid data, thereby enhancing the efficacy of the data mining process without resorting to discretization and avoiding information loss or practical meaning degradation in datasets. The proposed scheme dynamically adapts the threshold value for the neighborhood approximation space according to the characteristics of the given datasets, ensuring optimal performance without sacrificing accuracy. To evaluate the effectiveness of the proposed scheme, we develop a testbed tailored for Parkinson's patients, a domain where hybrid data processing is particularly relevant. The experimental results demonstrate that the proposed scheme consistently outperforms existing schemes in adaptively handling both numerical and categorical data, achieving an impressive accuracy of 95% on the Parkinson's dataset. Overall, this research contributes to advancing hybrid data processing techniques by providing a robust and adaptive solution that addresses the challenges associated with handling hybrid data, particularly in the context of Parkinson's disease analysis.
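The idea of a neighborhood rough set over mixed data can be sketched as follows. This is only an illustration under stated assumptions: the distance (mean of range-normalized numeric differences and 0/1 categorical mismatches), the fixed threshold `delta`, and the toy records are all hypothetical, and the adaptive threshold described in the abstract is not implemented.

```python
def hybrid_distance(x, y, numeric_ranges):
    """Mean per-attribute distance over a mixed record.

    numeric_ranges maps the index of each numeric attribute to its value range;
    every other attribute is treated as categorical (0 if equal, 1 otherwise).
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        if i in numeric_ranges:
            total += abs(xi - yi) / numeric_ranges[i]
        else:
            total += 0.0 if xi == yi else 1.0
    return total / len(x)


def approximations(samples, labels, target, delta, numeric_ranges):
    """Lower and upper approximations of the class `target`."""
    target_set = {i for i, label in enumerate(labels) if label == target}
    lower, upper = set(), set()
    for i, x in enumerate(samples):
        # Neighborhood: all samples within distance delta of sample i.
        neighborhood = {
            j for j, y in enumerate(samples)
            if hybrid_distance(x, y, numeric_ranges) <= delta
        }
        if neighborhood <= target_set:
            lower.add(i)   # certainly in the class
        if neighborhood & target_set:
            upper.add(i)   # possibly in the class
    return lower, upper


# Hypothetical records: (numeric score, categorical group); label 1 = case.
samples = [(10, "a"), (15, "a"), (50, "b"), (55, "b"), (90, "c"), (95, "c")]
labels = [1, 1, 1, 0, 0, 0]
lower, upper = approximations(samples, labels, 1, 0.1, {0: 85})
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary))
```

Samples 2 and 3 are close to each other but carry different labels, so they fall in the boundary region, which is how the model represents uncertainty without discretizing the numeric attribute.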


Subject(s)
Algorithms , Parkinson Disease , Humans , Data Mining/methods , Uncertainty
5.
PLoS One ; 19(4): e0300701, 2024.
Article in English | MEDLINE | ID: mdl-38564591

ABSTRACT

Space medicine is a vital discipline with often time-intensive and costly projects and constrained opportunities for studying various elements such as space missions, astronauts, and simulated environments. Moreover, private interests gain increasing influence in this discipline. In scientific disciplines with these features, transparent and rigorous methods are essential. Here, we undertook an evaluation of transparency indicators in publications within the field of space medicine. A meta-epidemiological assessment of PubMed Central Open Access (PMC OA) eligible articles within the field of space medicine was performed for prevalence of code sharing, data sharing, pre-registration, conflicts of interest, and funding. Text mining was performed with the rtransparent text mining algorithms with manual validation of 200 random articles to obtain corrected estimates. Across 1215 included articles, 39 (3%) shared code, 258 (21%) shared data, 10 (1%) were registered, 110 (90%) contained a conflict-of-interest statement, and 1141 (93%) included a funding statement. After manual validation, the corrected estimates for code sharing, data sharing, and registration were 5%, 27%, and 1%, respectively. Data sharing was 32% when limited to original articles and highest in space/parabolic flights (46%). Overall, across space medicine we observed modest rates of data sharing, rare sharing of code and almost non-existent protocol registration. Enhancing transparency in space medicine research is imperative for safeguarding its scientific rigor and reproducibility.


Subject(s)
Aerospace Medicine , Reproducibility of Results , Information Dissemination , PubMed , Data Mining
6.
Environ Geochem Health ; 46(5): 146, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38578375

ABSTRACT

With the transformation and upgrading of industries, the environmental problems caused by contaminated sites left behind by industry are becoming increasingly prominent. Based on actual investigation cases, this study analyzed the soil pollution status of a residual contaminated site of the copper and zinc rolling industry and found that the pollutants exceeding the screening values included Cu, Ni, Zn, Pb, total petroleum hydrocarbons (TPH), and 6 polycyclic aromatic hydrocarbon (PAH) monomers. Based on traditional analysis methods such as the correlation coefficient and spatial distribution, combined with machine learning methods such as SOM + K-means, it is inferred that the heavy metals Zn and Pb may be mainly related to the production history of zinc rolling, whereas Cu and Ni may have originated mainly from the production history of copper rolling. PAHs are mainly attributed to the incomplete combustion of fossil fuels in the melting equipment. TPH pollution is speculated to be related to oil leakage during the period of industrial use and the later period of vehicle parking. The results showed that traditional analysis methods can quickly identify the correlation between site pollutants, while SOM + K-means machine learning methods can further extract complex hidden relationships in the data and achieve in-depth mining of site monitoring data.


Subject(s)
Environmental Pollutants , Metals, Heavy , Polycyclic Aromatic Hydrocarbons , Soil Pollutants , Copper/analysis , Polycyclic Aromatic Hydrocarbons/analysis , Lead/analysis , Soil Pollutants/analysis , Metals, Heavy/analysis , Zinc/analysis , Environmental Pollution/analysis , Soil , Environmental Pollutants/analysis , Data Mining , Environmental Monitoring/methods , China , Risk Assessment
7.
Sci Rep ; 14(1): 8595, 2024 04 13.
Article in English | MEDLINE | ID: mdl-38615084

ABSTRACT

The COVID-19 pandemic has profoundly reshaped human life. The development of COVID-19 vaccines has offered a semblance of normalcy. However, obstacles to vaccination have led to substantial loss of life and economic burdens. In this study, we analyze data from a prominent health insurance provider in the United States to uncover the underlying reasons behind the inability, refusal, or hesitancy to receive vaccinations. Our research proposes a methodology for pinpointing affected population groups and suggests strategies to mitigate vaccination barriers and hesitations. Furthermore, we estimate potential cost savings resulting from the implementation of these strategies. To achieve our objectives, we employed Bayesian data mining methods to reduce data dimensions and identify significant variables (features) influencing vaccination decisions. Comparative analysis shows that the Bayesian method outperforms cutting-edge alternatives.


Subject(s)
COVID-19 , Humans , Bayes Theorem , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Pandemics , Data Mining , Vaccination
8.
Ren Fail ; 46(1): 2337285, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38616180

ABSTRACT

More than half of the world's population lives in Asia, and hypertension (HTN) is the most prevalent risk factor found in Asia. Numerous articles have been published about HTN in the Eastern Mediterranean Region (EMRO), and artificial intelligence (AI) methods can analyze these articles and extract the top trends in each country. The present analysis uses latent Dirichlet allocation (LDA), a topic modeling (TM) algorithm in text mining, to obtain the subjective topic-word distribution from 2790 studies across the EMRO. The studies examined cover the last 12 years, and the LDA results show that HTN research published in the EMRO discusses changes in blood pressure (BP) and the factors affecting it. Among the countries in the region, most of these articles relate to I.R. Iran and Egypt, with an increasing trend from 2017 to 2018 and a peak in 2021. Meanwhile, Iraq and Lebanon have been conducting research since 2010. The EMRO word cloud illustrates 'BMI', 'mortality', 'age', and 'meal', which represent important indicators, dangerous outcomes of high BP, and gender of HTN patients in EMRO, respectively.


Subject(s)
Artificial Intelligence , Hypertension , Humans , Data Mining , Algorithms , Asia/epidemiology , Hypertension/epidemiology
9.
Syst Rev ; 13(1): 107, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622611

ABSTRACT

BACKGROUND: Abstract review is a time- and labor-consuming step in systematic and scoping literature reviews in medicine. Text mining methods, typically natural language processing (NLP), may efficiently replace manual abstract screening. This study applies NLP to a deliberately selected literature review problem, the trend of using NLP in medical research, to demonstrate the performance of this automated abstract review model. METHODS: Scanning the PubMed, Embase, PsycINFO, and CINAHL databases, we identified 22,294 with a final selection of 12,817 English abstracts published between 2000 and 2021. We devised a manual classification of medical fields using three variables: the context of use (COU), text source (TS), and primary research field (PRF). A training dataset was developed after reviewing 485 abstracts. We used a language model called Bidirectional Encoder Representations from Transformers to classify the abstracts. To evaluate the performance of the trained models, we report the micro F1-score and accuracy. RESULTS: The trained models' micro F1-scores for classifying abstracts into the three variables were 77.35% for COU, 76.24% for TS, and 85.64% for PRF. The average annual growth rate (AAGR) of the publications was 20.99% between 2000 and 2020 (72.01 articles (95% CI: 56.80-78.30) yearly increase), with 81.76% of the abstracts published between 2010 and 2020. Studies on neoplasms constituted 27.66% of the entire corpus with an AAGR of 42.41%, followed by studies on mental conditions (AAGR = 39.28%). While electronic health or medical records comprised the highest proportion of text sources (57.12%), omics databases had the highest growth among all text sources with an AAGR of 65.08%. The most common NLP application was clinical decision support (25.45%). CONCLUSIONS: BioBERT showed acceptable performance in abstract review. If future research confirms high performance for this language model, it could reliably replace manual abstract review.
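The two summary measures reported here, the micro F1-score and the average annual growth rate (AAGR), can be worked through with a small sketch. The labels and yearly counts are hypothetical, and AAGR is taken in its usual sense as the mean of year-over-year growth rates, which the abstract does not spell out.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives over all classes first."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def aagr(yearly_counts):
    """Mean of the year-over-year growth rates of a series of counts."""
    rates = [
        (current - previous) / previous
        for previous, current in zip(yearly_counts, yearly_counts[1:])
    ]
    return sum(rates) / len(rates)


# Hypothetical text-source labels for six abstracts.
y_true = ["EHR", "EHR", "omics", "social", "EHR", "omics"]
y_pred = ["EHR", "omics", "omics", "social", "EHR", "EHR"]
score = micro_f1(y_true, y_pred)

# Hypothetical publication counts for three consecutive years.
growth = aagr([100, 120, 150])
print(f"micro F1 = {score:.4f}, AAGR = {growth:.3f}")
```

When every abstract receives exactly one label per variable, each error counts once as a false positive and once as a false negative, so the micro F1-score equals plain accuracy.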


Subject(s)
Biomedical Research , Natural Language Processing , Humans , Language , Data Mining , Electronic Health Records
10.
Commun Biol ; 7(1): 482, 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38643247

ABSTRACT

Many biomedical research publications contain gene sets in their supporting tables, and these sets are currently not available for search and reuse. By crawling PubMed Central, the Rummagene server provides access to hundreds of thousands of such mammalian gene sets. So far, we have scanned 5,448,589 articles to find 121,237 articles that contain 642,389 gene sets. These sets are served for enrichment analysis, free text, and table title search. Investigating statistical patterns within the Rummagene database, we demonstrate that Rummagene can be used for transcription factor and kinase enrichment analyses, and for gene function predictions. By combining gene set similarity with abstract similarity, Rummagene can find surprising relationships between biological processes, concepts, and named entities. Overall, Rummagene brings to the surface the ability to search a massive collection of published biomedical datasets that are currently buried and inaccessible. The Rummagene web application is available at https://rummagene.com.
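Searching a library of gene sets by similarity to a query set can be sketched with the Jaccard index. The measure, the table names, and the gene lists below are assumptions chosen for illustration; the abstract does not say how Rummagene scores gene set similarity.

```python
def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_gene_sets(query, library, top_n=3):
    """Return the top_n library sets most similar to the query."""
    scored = [(name, jaccard(query, genes)) for name, genes in library.items()]
    # Highest similarity first; ties broken by name for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical gene sets keyed by a made-up supporting-table identifier.
library = {
    "table_S1": {"TP53", "MDM2", "CDKN1A", "BAX"},
    "table_S2": {"MYC", "MAX", "CCND1"},
    "table_S3": {"TP53", "BAX", "BBC3"},
}
query = {"TP53", "BAX", "CDKN1A"}
ranked = rank_gene_sets(query, library)
for name, similarity in ranked:
    print(name, similarity)
```

A set-overlap score like this ignores how large the gene universe is, so enrichment tools often report an overlap p-value alongside it.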


Subject(s)
Biomedical Research , Data Mining , Animals , Software , Databases, Factual , Gene Expression Regulation , Mammals
11.
BMC Med Inform Decis Mak ; 24(Suppl 3): 98, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632621

ABSTRACT

BACKGROUND: Tremendous research efforts have been made in the Alzheimer's disease (AD) field to understand the disease's etiology and progression and to discover treatments for AD. Many mechanistic hypotheses, therapeutic targets, and treatment strategies have been proposed in the last few decades. Reviewing previous work and staying current on this ever-growing body of AD publications is an essential yet difficult task for AD researchers. METHODS: In this study, we designed and implemented a natural language processing (NLP) pipeline to extract gene-specific, neurodegenerative disease (ND)-focused information from the PubMed database. The collected publication information was filtered and cleaned to construct AD-related gene-specific publication profiles. Six categories of AD-related information were extracted from the processed publication data: publication trend by year, dementia type occurrence, brain region occurrence, mouse model information, keyword occurrence, and co-occurring genes. A user-friendly web portal was then developed using the Django framework to provide gene query functions and data visualizations for the generalized and summarized publication information. RESULTS: By implementing the NLP pipeline, we extracted gene-specific ND-related publication information from the abstracts of the publications in the PubMed database. The results are summarized and visualized through an interactive web query portal. Multiple visualization windows display the ND publication trends, mouse models used, dementia types, involved brain regions, keywords for major AD-related biological processes, and co-occurring genes. Direct links to PubMed sites are provided for all recorded publications on the query result page of the web portal. CONCLUSION: The resulting portal is a valuable tool and data source for quickly querying and displaying AD publications tailored to users' research areas and gene targets of interest, which is especially convenient for users without informatics mining skills. Our study will not only keep AD researchers updated on the progress of AD research and help them conduct preliminary examinations efficiently, but also offer additional support for hypothesis generation and validation, which will contribute significantly to the communication, dissemination, and progress of AD research.


Subject(s)
Alzheimer Disease , Neurodegenerative Diseases , Animals , Mice , Data Mining/methods , PubMed , Databases, Factual
12.
Methods Mol Biol ; 2787: 3-38, 2024.
Article in English | MEDLINE | ID: mdl-38656479

ABSTRACT

In this chapter, we explore the application of high-throughput crop phenotyping facilities for phenotype data acquisition and the extraction of significant information from the collected data through image processing and data mining methods. Additionally, the construction and outlook of crop phenotype databases are introduced and the need for global cooperation and data sharing is emphasized. High-throughput crop phenotyping significantly improves accuracy and efficiency compared to traditional measurements, making significant contributions to overcoming bottlenecks in the phenotyping field and advancing crop genetics.


Subject(s)
Crops, Agricultural , Data Mining , Image Processing, Computer-Assisted , Phenotype , Crops, Agricultural/genetics , Crops, Agricultural/growth & development , Data Mining/methods , Image Processing, Computer-Assisted/methods , Data Management/methods , High-Throughput Screening Assays/methods
13.
J Med Syst ; 48(1): 47, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38662184

ABSTRACT

Ontologies serve as comprehensive frameworks for organizing domain-specific knowledge, offering significant benefits for managing clinical data. This study presents the development of the Fall Risk Management Ontology (FRMO), designed to enhance clinical text mining, facilitate integration and interoperability between disparate data sources, and streamline clinical data analysis. By representing major entities within the fall risk management domain, the FRMO supports the unification of clinical language and decision-making processes, ultimately contributing to the prevention of falls among older adults. We used the Web Ontology Language (OWL) to build the FRMO in Protégé. Of the seven steps of the Stanford approach, six were used in the development of the FRMO: (1) defining the domain and scope of the ontology, (2) reusing existing ontologies when possible, (3) enumerating ontology terms, (4) specifying the classes and their hierarchy, (5) defining the properties of the classes, and (6) defining the facets of the properties. We evaluated the FRMO using four main criteria: consistency, completeness, accuracy, and clarity. The developed ontology comprises 890 classes arranged in a hierarchical structure, including six top-level classes, with a total of 43 object properties and 28 data properties. The FRMO is the first comprehensively described semantic ontology for fall risk management. Healthcare providers can use the ontology as the basis of clinical decision technology for managing falls among older adults.


Subject(s)
Accidental Falls , Data Mining , Risk Management , Accidental Falls/prevention & control , Humans , Data Mining/methods , Biological Ontologies , Electronic Health Records/organization & administration , Semantics
14.
J Med Internet Res ; 26: e53375, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38568723

ABSTRACT

BACKGROUND: The initiation of clinical trials for messenger RNA (mRNA) HIV vaccines in early 2022 revived public discussion on HIV vaccines after 3 decades of unsuccessful research. These trials followed the success of mRNA technology in COVID-19 vaccines but unfolded amid intense vaccine debates during the COVID-19 pandemic. It is crucial to gain insights into public discourse and reactions about potential new vaccines, and social media platforms such as X (formerly known as Twitter) provide important channels. OBJECTIVE: Drawing from infodemiology and infoveillance research, this study investigated the patterns of public discourse and message-level drivers of user reactions on X regarding HIV vaccines by analyzing posts using machine learning algorithms. We examined how users used different post types to contribute to topics and valence and how these topics and valence influenced like and repost counts. In addition, the study identified salient aspects of HIV vaccines related to COVID-19 and prominent anti-HIV vaccine conspiracy theories through manual coding. METHODS: We collected 36,424 English-language original posts about HIV vaccines on the X platform from January 1, 2022, to December 31, 2022. We used topic modeling and sentiment analysis to uncover latent topics and valence, which were subsequently analyzed across post types in cross-tabulation analyses and integrated into linear regression models to predict user reactions, specifically likes and reposts. Furthermore, we manually coded the 1000 most engaged posts about HIV and COVID-19 to uncover salient aspects of HIV vaccines related to COVID-19 and the 1000 most engaged negative posts to identify prominent anti-HIV vaccine conspiracy theories. RESULTS: Topic modeling revealed 3 topics: HIV and COVID-19, mRNA HIV vaccine trials, and HIV vaccine and immunity. 
HIV and COVID-19 underscored the connections between HIV vaccines and COVID-19 vaccines, as evidenced by subtopics about their reciprocal impact on development and various comparisons. The overall valence of the posts was marginally positive. Compared to self-composed posts initiating new conversations, there was a higher proportion of HIV and COVID-19-related and negative posts among quote posts and replies, which contribute to existing conversations. The topic of mRNA HIV vaccine trials, most evident in self-composed posts, increased repost counts. Positive valence increased like and repost counts. Prominent anti-HIV vaccine conspiracy theories often falsely linked HIV vaccines to concurrent COVID-19 and other HIV-related events. CONCLUSIONS: The results highlight COVID-19 as a significant context for public discourse and reactions regarding HIV vaccines from both positive and negative perspectives. The success of mRNA COVID-19 vaccines shed a positive light on HIV vaccines. However, COVID-19 also situated HIV vaccines in a negative context, as observed in some anti-HIV vaccine conspiracy theories misleadingly connecting HIV vaccines with COVID-19. These findings have implications for public health communication strategies concerning HIV vaccines.


Subject(s)
AIDS Vaccines , COVID-19 , HIV Infections , Humans , COVID-19 Vaccines , Pandemics , Data Mining , COVID-19/epidemiology , COVID-19/prevention & control , RNA, Messenger , HIV Infections/prevention & control
15.
Artif Intell Med ; 151: 102847, 2024 May.
Article in English | MEDLINE | ID: mdl-38658131

ABSTRACT

Building clinical registries is an important step in clinical research and in improving the quality of patient care. Natural Language Processing (NLP) methods have shown promising results in extracting valuable information from unstructured clinical notes. However, the structure and nature of clinical notes are very different from the regular text that state-of-the-art NLP models are trained and tested on, and clinical notes pose their own set of challenges. In this study, we propose Sentence Extractor with Keywords (SE-K), an efficient and interpretable classification approach for extracting information from clinical notes, and show that it outperforms more computationally expensive methods in text classification. Following Institutional Review Board (IRB) approval, we used SE-K and two embedding-based NLP approaches (Sentence Extractor with Embeddings (SE-E) and Bidirectional Encoder Representations from Transformers (BERT)) to develop a comprehensive registry of anterior cruciate ligament surgeries from 20 years of unstructured clinical data at a multi-site tertiary-care regional children's hospital. The low-resource approach (SE-K) had better performance (average AUROC of 0.94 ± 0.04) than the embedding-based approaches (SE-E: 0.93 ± 0.04 and BERT: 0.87 ± 0.09) for out-of-sample validation, in addition to the smallest performance drop between test and out-of-sample validation. Moreover, the SE-K approach was at least six times faster (on CPU) than SE-E (on CPU) and BERT (on GPU) and provides interpretability. Our proposed approach, SE-K, can be effectively used to extract relevant variables from clinical notes to build large-scale registries, with consistently better performance compared to the more resource-intensive approaches (e.g., BERT). Such approaches can facilitate information extraction from unstructured notes for registry building, quality improvement, and adverse event monitoring.
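The general idea of keyword-driven sentence extraction can be sketched in a few lines. This is a simplified stand-in rather than the SE-K method itself, whose details are not given in the abstract; the note text, the keyword list, and the sentence-splitting rule are hypothetical.

```python
import re


def split_sentences(note):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [part for part in re.split(r"(?<=[.!?])\s+", note.strip()) if part]


def extract_sentences(note, keywords):
    """Keep only the sentences that mention at least one keyword."""
    patterns = [
        re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for keyword in keywords
    ]
    return [
        sentence for sentence in split_sentences(note)
        if any(pattern.search(sentence) for pattern in patterns)
    ]


def mentions_procedure(note, keywords):
    """Flag a note when any sentence matches; the match doubles as evidence."""
    return bool(extract_sentences(note, keywords))


# Hypothetical note and keyword list.
keywords = ["ACL", "anterior cruciate ligament"]
note = (
    "Patient seen for follow-up. "
    "ACL reconstruction performed with hamstring autograft. "
    "No fever reported."
)
evidence = extract_sentences(note, keywords)
print(evidence)
```

The matched sentences can be shown to a reviewer as the reason a note was flagged, which is the kind of interpretability a keyword-based approach offers over an embedding-based classifier.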


Subject(s)
Natural Language Processing , Registries , Humans , Electronic Health Records , Data Mining/methods
16.
Sci Rep ; 14(1): 6403, 2024 03 16.
Article in English | MEDLINE | ID: mdl-38493251

ABSTRACT

Chinese patent medicine (CPM) is a typical type of traditional Chinese medicine (TCM) preparation that uses Chinese herbs as raw materials and is an important means of treating diseases in TCM. Chinese patent medicine instructions (CPMI) serve as a guide for patients to use drugs safely and effectively. In this study, we apply a pre-trained language model to the domain of CPM. We assembled, processed, and released the first CPMI dataset and fine-tuned the ChatGLM-6B base model, resulting in the development of CPMI-ChatGLM. We employed consumer-grade graphics cards for parameter-efficient fine-tuning and investigated the impact of LoRA and P-Tuning v2, as well as different data scales and instruction data settings, on model performance. We evaluated CPMI-ChatGLM using BLEU, ROUGE, and BARTScore metrics. Our model achieved scores of 0.7641, 0.8188, 0.7738, 0.8107, and -2.4786 on the BLEU-4, ROUGE-1, ROUGE-2, ROUGE-L, and BARTScore metrics, respectively. In comparison experiments and human evaluation with four large language models of similar parameter scales, CPMI-ChatGLM demonstrated state-of-the-art performance. CPMI-ChatGLM demonstrates commendable proficiency in CPM recommendations, making it a promising tool for auxiliary diagnosis and treatment. Furthermore, the various attributes in the CPMI dataset can be used for data mining and analysis, providing practical application value and research significance.
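The ROUGE scores listed here measure overlap between a generated text and a reference. A minimal sketch of ROUGE-N and ROUGE-L F-scores follows, assuming whitespace-tokenized English strings; the example sentences are hypothetical, and Chinese text would need a different tokenizer, so this is not the paper's evaluation pipeline.

```python
from collections import Counter


def rouge_n(reference, candidate, n=1):
    """F-score over clipped n-gram overlap between two token lists."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_counts, cand_counts = ngrams(reference), ngrams(candidate)
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, token_a in enumerate(a, start=1):
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def rouge_l(reference, candidate):
    """F-score based on the longest common subsequence."""
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(candidate), lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference instruction and model output.
reference = "take two pills after meals".split()
candidate = "take two pills before meals".split()
r1 = rouge_n(reference, candidate, 1)
r2 = rouge_n(reference, candidate, 2)
rl = rouge_l(reference, candidate)
print(f"ROUGE-1={r1:.2f} ROUGE-2={r2:.2f} ROUGE-L={rl:.2f}")
```

The example also shows a known limit of overlap metrics: swapping "after" for "before" changes the instruction's meaning while leaving ROUGE-1 at 0.8, which is one reason such scores are usually paired with human evaluation.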


Subject(s)
Drugs, Chinese Herbal , Nonprescription Drugs , Humans , Medicine, Chinese Traditional/methods , Data Mining , Drugs, Chinese Herbal/therapeutic use
17.
Comput Biol Med ; 172: 108233, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38452471

ABSTRACT

BACKGROUND: Cancer cachexia is a severe metabolic syndrome marked by skeletal muscle atrophy. A successful clinical intervention for cancer cachexia is currently lacking. The study of cachexia mechanisms is largely based on preclinical animal models and the availability of high-throughput transcriptomic datasets of cachectic mouse muscles is increasing through the extensive use of next generation sequencing technologies. METHODS: Cachectic mouse muscle transcriptomic datasets of ten different studies were combined and mined by seven attribute weighting models, which analysed both categorical variables and numerical variables. The transcriptomic signature of cancer cachexia was identified by attribute weighting algorithms and was used to evaluate the performance of eleven pattern discovery models. The signature was employed to find the best combination of drugs (drug repurposing) for developing cancer cachexia treatment strategies, as well as to evaluate currently used cachexia drugs by literature mining. RESULTS: Attribute weighting algorithms ranked 26 genes as the transcriptomic signature of muscle from mice with cancer cachexia. Deep Learning and Random Forest models performed better in differentiating cancer cachexia cases based on muscle transcriptomic data. Literature mining revealed that a combination of melatonin and infliximab has negative interactions with 2 key genes (Rorc and Fbxo32) upregulated in the transcriptomic signature of cancer cachexia in muscle. CONCLUSIONS: The integration of machine learning, meta-analysis and literature mining was found to be an efficient approach to identifying a robust transcriptomic signature for cancer cachexia, with implications for improving clinical diagnosis and management of this condition.


Subject(s)
Cachexia , Neoplasms , Animals , Mice , Cachexia/genetics , Cachexia/metabolism , Data Mining , Gene Expression Profiling , Machine Learning , Meta-Analysis as Topic , Muscle, Skeletal , Neoplasms/complications , Neoplasms/genetics , Neoplasms/metabolism
18.
Front Endocrinol (Lausanne) ; 15: 1273265, 2024.
Article in English | MEDLINE | ID: mdl-38469137

ABSTRACT

Objective: The specific benefit and selection of acupoints in acupuncture for diabetic kidney disease (DKD) remain controversial. This study aims to explore the specific benefits and acupoint selection of acupuncture for DKD through meta-analysis and data mining. Methods: Clinical trials of acupuncture for DKD were searched in eight common databases. Meta-analysis was used to evaluate efficacy and safety, and data mining was used to explore acupoint selection. Results: Meta-analysis showed that, compared with the conventional drug group, the combined acupuncture group had a significantly higher clinical effective rate (risk ratio [RR] 1.35, 95% confidence interval [CI] 1.20 to 1.51, P < 0.00001) and high-density lipoprotein cholesterol (mean difference [MD] 0.36, 95% CI 0.27 to 0.46, P < 0.00001), and significantly lower urinary albumin (MD -0.39, 95% CI -0.42 to -0.36, P < 0.00001), urinary microalbumin (MD -32.63, 95% CI -42.47 to -22.79, P < 0.00001), urine β2-microglobulin (MD -0.45, 95% CI -0.66 to -0.24, P < 0.0001), serum creatinine (MD -15.36, 95% CI -21.69 to -9.03, P < 0.00001), glycated hemoglobin A1c (MD -0.69, 95% CI -1.18 to -0.19, P = 0.006), fasting blood glucose (MD -0.86, 95% CI -0.90 to -0.82, P < 0.00001), 2h postprandial plasma glucose (MD -0.87, 95% CI -0.92 to -0.82, P < 0.00001), total cholesterol (MD -1.23, 95% CI -2.05 to -0.40, P = 0.003), and triglyceride (MD -0.69, 95% CI -1.23 to -0.15, P = 0.01), while adverse events were comparable. Data mining revealed that CV12, SP8, SP10, ST36, SP6, BL20, BL23, and SP9 were the core acupoints for DKD treated by acupuncture. Conclusion: Acupuncture improved clinical symptoms, renal function indices such as uALB, umALB, uβ2-MG, and SCR, as well as blood glucose and blood lipids in patients with DKD, and has a favorable safety profile. CV12, SP8, SP10, ST36, SP6, BL20, BL23, and SP9 are the core acupoints for acupuncture in DKD, and this program is expected to become a supplementary treatment for DKD.
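As a rough consistency check on the reported risk ratio, the standard error and z-statistic can be recovered from a ratio estimate and its 95% CI. This is a minimal sketch that assumes a normal approximation on the log scale (a Wald-type interval); the abstract does not state how the interval was computed, and the function name `z_and_p_from_ratio_ci` is hypothetical.

```python
import math

def z_and_p_from_ratio_ci(estimate, lower, upper, z_crit=1.96):
    """Back out SE, z and a two-sided p-value from a ratio and its CI.

    Assumes the CI was built as exp(log(estimate) +/- z_crit * SE),
    which is an assumption, not something the abstract confirms.
    """
    log_est = math.log(estimate)
    # The CI is symmetric on the log scale, so its width is 2 * z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_est / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values reported in the abstract for the clinical effective rate.
se, z, p = z_and_p_from_ratio_ci(1.35, 1.20, 1.51)
print(f"SE(log RR) = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

Under these assumptions the recovered p-value falls below 0.00001, in line with the reported "P < 0.00001".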


Subject(s)
Acupuncture Therapy , Diabetes Mellitus , Diabetic Nephropathies , Humans , Blood Glucose , Cholesterol , Data Mining , Diabetic Nephropathies/drug therapy , Clinical Trials as Topic
19.
BMC Bioinformatics ; 25(1): 101, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38448845

ABSTRACT

PURPOSE: The expansion of research across various disciplines has led to a substantial increase in published papers and journals, highlighting the necessity for reliable text mining platforms for database construction and knowledge acquisition. This abstract introduces GPDMiner (Gene, Protein, and Disease Miner), a platform designed for the biomedical domain, addressing the challenges posed by the growing volume of academic papers. METHODS: GPDMiner is a text mining platform that utilizes advanced information retrieval techniques. It operates by searching PubMed for specific queries, then extracting and analyzing information relevant to the biomedical field. The system is designed to discern and illustrate relationships between biomedical entities obtained from automated information extraction. RESULTS: The implementation of GPDMiner demonstrates its efficacy in navigating the extensive corpus of biomedical literature. It efficiently retrieves, extracts, and analyzes information, highlighting significant connections between genes, proteins, and diseases. The platform also allows users to save their analytical outcomes in various formats, including Excel and images. CONCLUSION: GPDMiner adds notable functionality to the array of text mining tools available for the biomedical field. The tool presents an effective solution for researchers to navigate and extract relevant information from the vast unstructured texts found in biomedical literature, thereby providing distinctive capabilities that set it apart from existing methodologies. Its application is expected to greatly benefit researchers in this domain, enhancing their capacity for knowledge discovery and data management.


Subject(s)
Data Management , Data Mining , Databases, Factual , Knowledge Discovery , PubMed
20.
Medicine (Baltimore) ; 103(12): e37107, 2024 Mar 22.
Article in English | MEDLINE | ID: mdl-38518013

ABSTRACT

BACKGROUND: Acupuncture is widely used in the treatment of tinnitus worldwide because of its good efficacy and safety. However, the criteria for selecting acupoint prescriptions and combinations have not been summarized. Therefore, data mining was used herein to determine the treatment principles and the most effective acupoint selection for the treatment of idiopathic tinnitus. METHODS: The clinical research literature on acupuncture in the treatment of idiopathic tinnitus, from database inception to September 1, 2023, was retrieved and extracted from the China National Knowledge Infrastructure, China Medical Journal Full-text Database, PubMed, Embase, Cochrane Library and Web of Science databases. Microsoft Excel 2016 was used to establish the acupoint prescription database, and frequency statistics of acupoints, meridians and specific acupoints were carried out. IBM SPSS Statistics 25.0 software was used for cluster analysis of acupoints, and IBM SPSS Modeler 18.0 software was used for association rule analysis of acupoints. RESULTS: A total of 112 articles were included, involving 221 acupuncture prescriptions and 99 acupoints, with a total frequency of 1786 uses. The 5 most frequently used acupoints were Tinggong (SI19), Tinghui (GB2), Yifeng (TE17), Ermen (TE21), and Zhongzhu (TE3). The commonly used meridians were the Sanjiao meridian of hand-shaoyang, the Gallbladder meridian of foot-shaoyang and the Small intestine meridian of hand-taiyang. The specific points were mostly crossing points, five-shu points and yuan-primary points. The core acupoint combination from the association rules was Ermen (TE21)-Tinggong (SI19)-Tinghui (GB2)-Yifeng (TE17), and 3 effective clustering groups were obtained by cluster analysis of high-frequency acupoints. CONCLUSION: In this study, the published literature on acupuncture treatment of idiopathic tinnitus was analyzed by data mining, and the relationships between acupoints were explored, providing a better-informed basis for acupoint selection in the clinical acupuncture treatment of idiopathic tinnitus.
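The frequency statistics and association rule analysis described above can be illustrated with a minimal sketch. The study used Microsoft Excel and IBM SPSS Modeler; the plain-Python version below is a stand-in, not the authors' procedure. The prescriptions are hypothetical toy data (not drawn from the 221 prescriptions in the study), and the names `frequency`, `pair_rules`, and `min_support` are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy prescriptions (NOT data from the study);
# each prescription is a set of acupoint codes.
prescriptions = [
    {"SI19", "GB2", "TE17", "TE21"},
    {"SI19", "GB2", "TE3"},
    {"SI19", "TE17", "TE21"},
    {"GB2", "TE17"},
]

def frequency(prescriptions):
    """Count how many prescriptions contain each acupoint."""
    counts = Counter()
    for p in prescriptions:
        counts.update(p)
    return counts

def pair_rules(prescriptions, min_support=0.5):
    """Return (lhs, rhs, support, confidence) for acupoint pairs.

    support    = share of prescriptions containing both points
    confidence = share of prescriptions with lhs that also contain rhs
    """
    n = len(prescriptions)
    item_counts = frequency(prescriptions)
    pair_counts = Counter()
    for p in prescriptions:
        pair_counts.update(combinations(sorted(p), 2))
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules.
        for lhs, rhs in ((a, b), (b, a)):
            rules.append((lhs, rhs, support, c / item_counts[lhs]))
    # Highest confidence first, then support, then names for stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

for lhs, rhs, support, confidence in pair_rules(prescriptions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

This sketch only covers pairwise rules; a dedicated tool would typically also mine larger itemsets, such as the four-point combination reported in the abstract.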


Subject(s)
Acupuncture Therapy , Meridians , Tinnitus , Humans , Acupuncture Points , Tinnitus/therapy , Data Mining